-----------------------------------------------------------------
December 5th 2002 Version 2 FDAM1-2.4-021205
-----------------------------------------------------------------

INTEGRATORS:

Yi-Shin Tung (tung@cmlab.csie.ntu.edu.tw), National Taiwan University.

-----------------------------------------------------------------
REASON FOR UPDATE:
Major Changes of the Newly Integrated Version

1. The new version provides all the tools defined in both the second version of the MPEG-4 visual specification (14496-2:2001) and its second amendment (14496-2:2001/Amd 2).
2. Remove the "-fgs" input parameter for FGS decoding. The decoder now determines automatically which profile the input bitstream belongs to. The simple scalable, core scalable and streaming video profiles all take the same parameter types in the same order.
3. For rectangular-object video streams, the width and height parameters can be set to 0 if the dimensions of the bitstream to be decoded are not known.
4. Change the decoded-image-file parameter so that it is specified without ".yuv" for VTC decoding.
5. Support decoding a single FGST enhancement bitstream.
6. Integrate the error-resilience tools into the scalable enhancement layer; they can be enabled with the directive "_ERSSP_".
7. Fix most of the problems listed below.

-----------------------------------------------------------------
Problem Descriptions
The following bugs are fixed in the new release (refer to the Visual Problem Reported document).

1. The handling of brightness_change_factor in the current implementation is questionable. According to section 7.8.6 of 14496-2:2001, this factor is defined for the Y channel in static sprite decoding. However, neither the Microsoft nor the MoMuSys reference implementation uses it during decoding. The flag (brightness_change_factor) is encoded to and decoded from the stream, but none of the subsequent parameters are handled, and the flag is ignored during sample reconstruction. (Reported in VP 4.1)
2. "CSessionEncoder::encodeVideoObject" indexes with the wrong VisualObject_ID. (Reported in VP 4.3)
3. The MS reference software did not clip the DC coefficient as specified in subclause 7.4.3.4 of 14496-2:2001 and as done by the MoMuSys reference software. Although the case is fairly rare, compressing real data may trigger this bug and produce noticeable artifacts when the prediction direction (used for both AC and DC coefficients) in the decoder differs from that in the encoder. (Reported in VP 4.15)
4. According to Section 7.4.3.1 of 14496-2:2001, the AC/DC_prediction should be determined by the coefficient F[0][0], which is the value after the saturation process. However, the current MS software determines the AC/DC_prediction from the coefficients just after the IQ process but before the saturation process. This bug might cause a mismatch in the AC/DC_prediction between the MS software and the MoMuSys software. The clipping operation must be applied in both CVideoObject::inverseQuantizeIntraDc() and CVideoObject::inverseQuantizeDCTcoefH263(). Note that encoders can avoid these problems by restricting the dynamic range of the intra DC coefficients very strictly, which prevents the mismatch that occurs when the DC/AC coefficient prediction direction goes wrong. (Reported in VP 4.17)
5. The average of the motion vectors within a GMC macroblock is calculated and then treated as the motion of that macroblock. According to the specification, if the average motion vector falls outside the motion vector range specified by f_code, it is clipped into that range. In quartersample mode the vector is calculated in quarter-pel units; otherwise it is in half-pel units. In both cases the motion vector range should be the same (although expressed in different units), regardless of whether the quartersample flag is on or off. However, the current MS reference software divides the range by 2 in quartersample mode (gmc_motion.cpp: FastAffineWarpMotion()), which is incorrect. (Reported in VP 4.18)
6. TRD and TRB are calculated incorrectly in the interlaced direct_mode case. According to the specification, the first B-VOP interval, Tframe, has to be included when calculating TRD[i] and TRB[i]. However, the current implementation did not take this into account. (Reported in VP 4.19)
7. The reference encoder creates the resync marker for B-VOPs incorrectly: it generates a resync marker with a minimum of 16 zeros instead of 17. (Reported in VP 4.21)
8. In decodeVOPHead(), when vop_coded is false, m_vopmd.bInterlace is incorrectly set to FALSE. This change to m_vopmd.bInterlace affects the decoding of the next VOP in interlaced mode. (Reported in VP 4.24)
9. One more element is needed in the arrays m_rgmbmd and m_rgmbmdRef for data-partitioning P-VOPs in the MS reference software. This bug can produce wrong decoding results or crash the program. We found that the original decoder crashes when decoding vcon-ge18-ACEL3.bits.
10. Many memory blocks allocated during encoding and decoding were never freed. Cleanup code has been inserted to free those memory blocks, which helps to make the code more robust.
11. The reference bitstream "vcon-ge18-ACEL3.bits" (14496-4:2002) contains a not_coded I-VOP. This mode is now processed as a not_coded P-VOP.
12. Enlarge the clipping table in CBlockDCT() to correctly decode mat054.m4v (14496-4:2000), and add a check before applying the clipping table.
13. Enlarge the clipping table in CVideoObject::setClipTab() to correctly decode vcon-ge2.cmp and vcon-ge12.cmp (14496-4:2000).
14. The segmentation file was not created or set correctly when both the base and temporal enhancement layers are non-rectangular objects. This happens with some of the conformance bitstreams in 14496-4:1999, such as vcon-scc2.cmp and vcon-scc2_e.cmp.
15. The base-layer reconstructions produced by the FDAM version of the MS FGS reference software are sometimes incorrect, and these errors also propagate to the FGS layer. The integrated version fixes this problem, so the decoding result is correct and identical to that of the finalized version 2 reference software. The FGS conformance bitstreams a3fgs-17-L0-fgs.cmp, a3fgs-17-L1-fgs.cmp, a3fgs-17-L2-fgs.cmp and a3fgs-17-L3-fgs.cmp are therefore reconstructed correctly by the integrated version.
16. There is a bug in the main decoding loop for FGS temporal scalability: the last frame in the base layer is decoded but not dumped.
17. The background composition function has a problem when both the base and enhancement layers are non-rectangular objects. The composition should check not only the current mask but also the previous mask to set the background value correctly. The conformance bitstreams vcon-scc3.cmp and vcon-scc3_e.cmp of 14496-4:1999 encounter this problem: the background of the enhancement reconstruction is filled with the padding pixels of the previous or future background frame (the base-layer reconstruction), whereas these pixels should be transparent.
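
Items 3 and 4 above both hinge on saturating the reconstructed intra DC value before it is used to choose the prediction direction. The following is a minimal sketch of that idea, not the code in the reference software. The helper names are hypothetical, the saturation range [-2^(bits_per_pixel+3), 2^(bits_per_pixel+3)-1] and the gradient comparison are stated here as assumptions about the specification and should be checked against 14496-2:2001.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical helper: saturate a reconstructed coefficient.
// Assumed range: [-2^(bitsPerPixel+3), 2^(bitsPerPixel+3) - 1].
inline int saturateCoef(int coef, int bitsPerPixel = 8) {
    assert(bitsPerPixel > 0);
    const int hi = (1 << (bitsPerPixel + 3)) - 1;
    const int lo = -(1 << (bitsPerPixel + 3));
    return std::min(std::max(coef, lo), hi);
}

enum class PredDir { FromLeft, FromAbove };

// Hypothetical helper: choose the prediction direction from the DC values
// of the left (A), above-left (B) and above (C) neighbouring blocks.
// The inputs are saturated first, so encoder and decoder compare the same values.
inline PredDir dcPredictionDirection(int dcLeft, int dcAboveLeft, int dcAbove) {
    const int a = saturateCoef(dcLeft);
    const int b = saturateCoef(dcAboveLeft);
    const int c = saturateCoef(dcAbove);
    // Assumed rule: a smaller horizontal gradient selects prediction from above.
    return (std::abs(a - b) < std::abs(b - c)) ? PredDir::FromAbove
                                               : PredDir::FromLeft;
}
```

With neighbouring DC values of 2300 (left), 2000 (above-left) and 1800 (above), the unsaturated comparison is 300 against 200, while the saturated one is 47 against 200, so the two give opposite directions. This is the kind of encoder/decoder mismatch the items describe.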

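The resync marker length mentioned in item 7 above can be sketched as follows. This is a hedged illustration, assuming the rule is 16 zeros for an I-VOP, (15 + fcode) zeros for a P-VOP or S(GMC)-VOP, and max(15 + fcode, 17) zeros for a B-VOP, where fcode is the larger of the forward and backward f_codes. The enum and function names are hypothetical and do not come from the reference software.

```cpp
#include <algorithm>
#include <cassert>

enum class VopType { I, P, B, SGmc };

// Hypothetical helper: number of zero bits that precede the final '1'
// of a resync marker, under the assumed rule described in the lead-in.
inline int resyncMarkerZeroCount(VopType type, int fcodeForward, int fcodeBackward) {
    switch (type) {
        case VopType::I:
            return 16;
        case VopType::P:
        case VopType::SGmc:
            assert(fcodeForward >= 1 && fcodeForward <= 7);
            return 15 + fcodeForward;
        case VopType::B:
            assert(fcodeForward >= 1 && fcodeForward <= 7);
            assert(fcodeBackward >= 1 && fcodeBackward <= 7);
            // The max() with 17 is what keeps a B-VOP marker from having only 16 zeros.
            return std::max(15 + std::max(fcodeForward, fcodeBackward), 17);
    }
    return 16;  // not reached for valid VopType values
}
```

Under this assumed rule, a B-VOP with both f_codes equal to 1 needs 17 zeros, whereas an encoder that omits the max() would emit 16.
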
-----------------------------------------------------------------
Other Known Problems

The following known problems exist in the current MS reference software. They concern unsupported functionality or syntax, and should be addressed if they become practical or are requested by applications.
1. A bitstream of the core scalable profile@Level 3 may have at most two enhancement layers per object. However, the current implementation supports only one enhancement layer for this profile. In addition, no version of 14496-4 currently provides a conformance bitstream for this case.
2. The FGS_FGST layer, which is a combined enhancement layer, is not currently supported by the MS reference software. No version of 14496-4 includes a conformance bitstream for it either.
3. For arbitrary-shaped objects, the width and height parameters must be provided as arguments so that the frame buffers can be allocated. The current decoder crashes if the input width and height are smaller than actually required. It is suggested that the decoder stop decoding instead of crashing and report the minimal requirements for the frame being decoded.
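
The suggestion in item 3 above could look roughly like the sketch below: compare the size supplied on the command line with the size the frame actually needs, and return a message instead of proceeding into an undersized buffer. The struct and function names are hypothetical and are not part of the MS reference software.

```cpp
#include <cassert>
#include <string>

struct FrameSize {
    int width;
    int height;
};

// Hypothetical helper: returns an empty string when the supplied size is
// large enough, otherwise a message stating the minimum size required.
inline std::string checkFrameSize(const FrameSize& supplied, const FrameSize& required) {
    assert(required.width > 0 && required.height > 0);
    if (supplied.width >= required.width && supplied.height >= required.height) {
        return std::string();
    }
    return "frame buffer too small: supplied " +
           std::to_string(supplied.width) + "x" + std::to_string(supplied.height) +
           ", need at least " +
           std::to_string(required.width) + "x" + std::to_string(required.height);
}
```

A caller would stop decoding and print the returned message whenever it is non-empty.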


-----------------------------------------------------------------
FILES CHANGED:
many

-----------------------------------------------------------------
COMMENTS:

